ABSTRACT

Here, we describe the development of 
WikiPathways (http://www.wikipathways.org), a 
public wiki for pathway curation, since it was first 
published in 2008. New features are discussed, as 
well as developments in the community of contribu- 
tors. New features include a zoomable pathway 
viewer, support for pathway ontology annotations, 
the ability to mark pathways as private for a limited 
time and the availability of stable hyperlinks to 
pathways and the elements therein. WikiPathways 
content is freely available in a variety of formats 
such as the BioPAX standard, and the content is 
increasingly adopted by external databases and 
tools, including Wikipedia. A recent development is 
the use of WikiPathways as a staging ground for 
centrally curated databases such as Reactome. 
WikiPathways is seeing steady growth in the 
number of users, page views and edits for each 
pathway. To assess whether the community 
curation experiment can be considered successful, 
here we analyze the relation between use and con- 
tribution, which gives results in line with other wiki 
projects. The novel use of pathway pages as sup- 
plementary material to publications, as well as the 
addition of tailored content for research domains, 
is expected to stimulate growth further. 

INTRODUCTION 

WikiPathways (http://www.wikipathways.org) is a 
resource for biological pathways in the form of a wiki. 
It serves as a repository for biological knowledge in the
form of pathway diagrams and as a platform for curating,
sharing and publishing pathways. We launched 
WikiPathways in 2008 as an experiment in community- 
based curation of biological pathways (1). WikiPathways 
has continued to develop and has been adopted by the 
research community in several ways. 

Here, we present the latest developments for 
WikiPathways focusing on two specific aspects. In the 
first part, we highlight new features developed in recent 
years, and show how they fit with our general philosophy 
of community curation and collaboration. In the second
part, we analyze the size and activity of the community to
assess the success of WikiPathways so far. 

New features of WikiPathways 

Pathway diagrams are found everywhere: in textbooks, in 
review articles, on posters and on whiteboards. Their 
utility is to turn abstract knowledge into an understandable
visualization. WikiPathways enables biologists to
capture their rich, intuitive mental models of biological 
pathways and avoid the incomprehensible 'hairball' state 
of networks derived from big data. 

From this understanding of how pathways could be
used, we derived a number of assumptions that have 
guided the addition of new features. The first assumption 
is that manual curation leads to higher quality pathways 
in the long term. Consequently, we developed features that 
make manual interaction with the site easier. Our second 
assumption is that the success of WikiPathways is depend- 
ent on the community of curators, and thus we have added 
features to stimulate growth of the community and 
avoided features that increase the barrier to entrance for 
newcomers. Finally, it is our goal for WikiPathways to be 
a public resource for biological research. The content 
should be easily accessible for a wide range of
applications, including those that we support directly, as
well as those that we have not envisioned ourselves.

New developments in the pathway page 

The most visible aspect of WikiPathways is the pathway 
page. Each pathway has a dedicated page that provides 
information on a specific biological mechanism, including 
the pathway diagram, description, hyperlinks to detailed
information about genes, proteins and metabolites, and
relevant literature references. Entities such as genes,
proteins or metabolites in a pathway can be annotated
with many different identifier systems, such as Ensembl
(2), Entrez Gene (3) or ChEBI (4). Hyperlinks are
provided to many different external sources of information,
such as genome browsers, experimental platforms,
protein databases, Gene Ontology and Wikipedia. These
are directly available by clicking on a gene or protein box 
on the pathway page and allow the researcher to browse to 
detailed information related to components of the 
pathway. This way, the pathway page provides an 
organized summary of information related to a biological 
mechanism, and provides a starting point for researchers 
to browse through more specific and detailed resources. 

Earlier versions of the pathway page contained a static 
image of the pathway, without interactive access to linked
databases. Zooming in was possible only after opening the 
editor applet. Aiming to increase the manual interaction 
with the pathway page, we replaced the static pathway 
image with an interactive pathway viewer (Figure 1). 
This interactive viewer makes it possible to zoom and 
pan the diagram and click on elements in the pathway 
to view detailed information, similar to the popular 
Google Maps interface. Genes, proteins and metabolites
can be clicked for direct access to external databases. The
interactive viewer also contains a search function that
makes it easier to locate elements on the pathway diagram. 

Organizing pathway information 

Currently, WikiPathways contains over 1600 pathways and
supports 21 species, including vertebrates, several plant 
species, bacteria and model organisms such as worm, 
yeast and fruit fly. To support manual interaction with 
this growing amount of knowledge, we attempted to 
make it easier to organize and browse. Therefore, we 
recently implemented an ontology annotation feature 
that allows each pathway to be annotated with terms 
from several ontologies covering different topics [these 
currently include the Pathway, Human disease and Cell 
type ontologies from the BioPortal collection (5)]. This 
allows users to exactly specify the context that applies to 
the pathway, e.g. a specific disease, organ or cell type. By
using the pathway ontology, a hierarchical organization of
the pathways can be created that groups smaller, more
specific pathways (e.g. oxidative phosphorylation) into
larger superpathways (e.g. energy metabolism). These annotations
make it easier to find pathways related to a
specific topic, by allowing users to search pathways by
ontology terms. It will also facilitate data analysis and
visualization methods, for example, a pathway interaction 
network could be built from the hierarchical organization 
in the pathway ontology to analyze biological mechanisms 
at different levels of detail. 
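
As a hedged illustration of the hierarchical organization described above, the following Python sketch groups pathways under ontology terms and collects every pathway that falls beneath a superpathway term. The term relations, the pathway identifiers and the names `PARENT`, `ANNOTATIONS`, `ancestors` and `pathways_by_term` are hypothetical; they are not taken from the Pathway Ontology or from WikiPathways.

```python
from collections import defaultdict

# Hypothetical child -> parent relations between ontology terms.
PARENT = {
    "oxidative phosphorylation": "energy metabolism",
    "glycolysis": "energy metabolism",
    "energy metabolism": "metabolic pathway",
}

# Hypothetical pathway -> ontology term annotations.
ANNOTATIONS = {
    "WP1": ["oxidative phosphorylation"],
    "WP2": ["glycolysis"],
    "WP3": ["energy metabolism"],
}


def ancestors(term):
    """Return the term itself followed by all of its ancestors."""
    chain = [term]
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


def pathways_by_term(annotations):
    """Index pathways under each annotated term and all of its ancestors."""
    index = defaultdict(set)
    for pathway, terms in annotations.items():
        for term in terms:
            for t in ancestors(term):
                index[t].add(pathway)
    return index


index = pathways_by_term(ANNOTATIONS)
print(sorted(index["energy metabolism"]))
```

Searching by a broad term then returns the pathways annotated with any of its more specific child terms, which is the behaviour a superpathway grouping would need.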

Figure 1. The new interactive pathway viewer.

Pathways as a public resource

WikiPathways is a public resource intended to benefit
biological research worldwide. All content is freely
available under the Creative Commons attribution
license (6). Because of this license, there is no limit to
redistribution of pathway content. The WikiPathways
content is already accessible from several external 
resources, such as NCBI BioSystems (7) and BioGPS 
(8). We are continually looking for new ways to make 
pathway information available to a wider public. For 
example, we recently added Interactive Pathway Maps 
into human gene and pathway-related articles 
at Wikipedia (9). Implemented as reusable templates, 
these interactive pathway maps link related Gene Wiki 
(10) articles. Each of these pathways at Wikipedia is 
viewed tens of thousands of times per day. We were 
careful not to add flagrant links to WikiPathways. 
Clicking on a pathway at Wikipedia, for example,
keeps you at Wikipedia. Nevertheless, we witnessed a 
38% increase in new users and Wikipedia is consistently 
one of our top referring sites. Our pathway template 
passed through the Molecular and Cellular Biology 
Wikiproject proposal process and was vetted in accordance
with rigorous guidelines by a number of
'Wikipedians' who specialize in biology-related content. 
WikiPathways content can now be deployed as 'stub' 
articles to initiate new articles around important biology 
topics. 

Each pathway in WikiPathways carries an identifier of 
the form 'WP1234' (where 1234 can be a number of any 
size), which enables stable URLs to be formed to link 
directly to a pathway. This simple yet effective feature is 
important for online collaboration, as a URL can be 
shared via email. We recently added support for linking 
directly to genes, proteins or metabolites in the pathway.
The linked element is opened in the new interactive 
pathway viewer and highlighted. This way users can be 
directly pointed to an element of interest, for example, 
from a search result or external resource. The pathway 
viewer can also be included in any website as a widget, 
for example to share a pathway of interest via a blog 
post, or to publish a pathway on the project page of a
research group. These new ways of linking to pathway
content serve to increase the cohesion of WikiPathways 
with other online biology and bioinformatics resources. 
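
The identifier scheme described above can be sketched in Python as follows. Only the `WP` plus number pattern and the host name come from the text; the URL layout in `BASE_URL` is an assumption made for illustration and should be checked against the live site, and `pathway_url` is a hypothetical helper name.

```python
import re

# Assumed URL layout; only the host name is given in the text.
BASE_URL = "http://www.wikipathways.org/index.php/Pathway:"

# 'WP' followed by a number of any size, as described in the text.
WP_ID = re.compile(r"WP[0-9]+")


def pathway_url(identifier):
    """Build a link to a pathway page from an identifier such as 'WP1234'."""
    if not WP_ID.fullmatch(identifier):
        raise ValueError(f"not a WikiPathways identifier: {identifier!r}")
    return BASE_URL + identifier


print(pathway_url("WP1234"))
```

Validating the identifier before building the link keeps malformed references out of emails, manuscripts and external resources.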

In some use cases, it is not desirable that a new pathway 
is immediately publicly available. For example, when the 
pathway is used as supplementary material to a still 
unpublished manuscript, it should only be visible to a
limited number of users. Therefore, we implemented the
possibility to create a pathway, but postpone its publication
by temporarily marking it as private and thereby
hiding it from public view. The pathway author can then 
set permissions for specific user accounts, for example, to 
allow only collaborators to view and edit the pathway. 
This way authors can obtain a stable identifier that can
be sent as a URL to collaborators or cited in a
manuscript, and that allows referees to access the
pathway during the peer review process. By default, the
pathway will automatically become public after 1 
month, but the author can actively postpone this 
deadline each month. By requiring a periodic action to
prevent the pathway from becoming public, we expect
that all private pathway information will eventually
become publicly available to the WikiPathways
community.
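
The expiry rule for private pathways can be sketched as below. This is a minimal model under stated assumptions, not the site's implementation: one month is modelled as 30 days, postponing restarts the period from the day of the action, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "1 month" is modelled as 30 days.
PRIVATE_PERIOD = timedelta(days=30)


@dataclass
class PrivatePathway:
    identifier: str
    expires: date

    @classmethod
    def create(cls, identifier, today):
        """Mark a new pathway as private for one period."""
        return cls(identifier, today + PRIVATE_PERIOD)

    def is_public(self, today):
        """The pathway becomes public once the deadline has passed."""
        return today >= self.expires

    def postpone(self, today):
        """Author action: restart the private period from today (assumed rule)."""
        if self.is_public(today):
            raise ValueError("pathway is already public")
        self.expires = today + PRIVATE_PERIOD


p = PrivatePathway.create("WP1234", date(2010, 1, 1))
p.postpone(date(2010, 1, 20))
print(p.is_public(date(2010, 2, 10)), p.is_public(date(2010, 3, 1)))
```

Because publication is the default and privacy requires a repeated action, a pathway whose author stops responding becomes public without any further intervention.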

Pathways in bioinformatics applications 

Pathways produced at WikiPathways are in a format that 
can be directly used in downstream data analysis by a 
number of software tools. Thus, we complete a cycle 
starting with researcher knowledge that, when synthesized
with standardized data, leads to novel pathway models 
that can be used to visualize and analyze other data sets, 
leading to new insights, experiments and knowledge. 

WikiPathways content is distributed through numerous 
online resources and bioinformatics software packages.
We provide pathways in an open, XML standard 
format, called GPML, which is explicitly compatible 
with a handful of analysis tools, such as GenMAPP 
(11), PathVisio (12), Cytoscape (13) and GO-Elite (14). 
These tools support various workflows involving visual- 
ization and analysis of experimental data. The GPML 
format can be made compatible with any tool that 
chooses to use it since it is cross-platform, open and 
actively supported. For an even broader audience, we 
provide our pathways in BioPAX (15) format as 
well. This allows, for example, integration of the content
into pathway unification efforts such as Pathway 
Commons (16). 
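
Because GPML is XML, a tool can read pathway content with a standard XML parser. The Python sketch below extracts annotated nodes from a small, hand-written fragment modelled on GPML. The fragment is simplified and not validated; the namespace, element and attribute names and the identifiers in it are illustrative assumptions that should be checked against the GPML schema, and `data_nodes` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment; not a complete or validated GPML file.
SAMPLE = """<Pathway xmlns="http://genmapp.org/GPML/2010a" Name="Example" Organism="Homo sapiens">
  <DataNode TextLabel="TP53" Type="GeneProduct">
    <Xref Database="Entrez Gene" ID="7157"/>
  </DataNode>
  <DataNode TextLabel="glucose" Type="Metabolite">
    <Xref Database="ChEBI" ID="17234"/>
  </DataNode>
</Pathway>"""


def data_nodes(gpml_text):
    """Return (label, type, database, id) for every node that has an Xref."""
    root = ET.fromstring(gpml_text)
    # Reuse whatever namespace the root element declares, if any.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    rows = []
    for node in root.iter(ns + "DataNode"):
        xref = node.find(ns + "Xref")
        if xref is None:
            continue
        rows.append((node.get("TextLabel"), node.get("Type"),
                     xref.get("Database"), xref.get("ID")))
    return rows


for row in data_nodes(SAMPLE):
    print(row)
```

Reading the namespace from the root element, rather than hard-coding it, lets the same function handle fragments written against a different schema version.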

Pathway information is also available through our open
web service API, providing access to WikiPathways 
content to a broad spectrum of software developers (17). 
This web service processed over 45,000 requests by 
external scripts per month over 2010. It can be used to 
integrate pathway information directly from 
WikiPathways into scripts, data analysis workflows or 
external tools. An example of a web application that 
uses the WikiPathways web service for pathway analysis 
is WebGestalt (18), which allows researchers to find 
over-represented pathways from a user-specified input 
list. An example of a locally installed tool that integrates
WikiPathways content via the web service is 
DomainGraph (19), a plugin for the network analysis 
tool Cytoscape that can be used to visualize 
alternative-splicing data.
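
To make the idea of finding over-represented pathways concrete, the sketch below ranks pathways by a one-sided hypergeometric test, a common choice for this kind of analysis. The statistics that WebGestalt actually applies are not described here, so this is a swapped-in illustration; the gene sets are made up and the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, pathway_size, sample, overlap):
    """P(X >= overlap) when `sample` genes are drawn without replacement
    from `population` genes, `pathway_size` of which belong to the pathway."""
    upper = min(pathway_size, sample)
    tail = sum(
        comb(pathway_size, k) * comb(population - pathway_size, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / comb(population, sample)


def enrichment(input_genes, pathways, background):
    """Rank pathways by p-value for overlap with the input gene list."""
    sample = input_genes & background
    results = []
    for name, members in pathways.items():
        members = members & background
        hits = sample & members
        p = hypergeom_pvalue(len(background), len(members), len(sample), len(hits))
        results.append((name, len(hits), p))
    return sorted(results, key=lambda r: r[2])


# Made-up example data.
background = {f"g{i}" for i in range(20)}
pathways = {"A": {"g0", "g1", "g2", "g3", "g4"},
            "B": {"g10", "g11", "g12", "g13", "g14"}}
ranked = enrichment({"g0", "g1", "g2", "g10"}, pathways, background)
for name, hits, p in ranked:
    print(name, hits, round(p, 4))
```

In practice the resulting p-values would also need correction for testing many pathways at once, which this sketch leaves out.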

Growth and activity of the WikiPathways community 

The goal of WikiPathways is to capture knowledge about 
biological pathways (the elements, their interactions and 
layout) in a form that is both human readable and 
amenable to computational analysis. Curating and main- 
taining a public collection of pathways is a large and 
never-ending task, as a continuous stream of new know- 
ledge is being generated. Given the current fast growth of 
knowledge, centralized curation as employed in existing 
pathway resources may not scale well (20). Wikis 
provide an effective platform for community-based 
curation that may scale better, since users can
directly contribute, update and expand the content. 
Given a large and active enough group of users, this 
improves comprehensiveness and quality of the content
in the long term. The mechanism behind this
can be described as a positive feedback loop (21).
The wiki starts with some initial content which attracts
users. A small fraction of these users will also correct or
extend the content, which on average leads to an improvement
of the quality and usefulness of the content. This will
attract more users, thereby also increasing the number of
potential editors, which improves the content even more.
Based on the statistics gathered during the short history of
WikiPathways, it might be possible to obtain more insight
into how well the wiki approach has worked so far for
biological pathways.

Figure 2. Size and growth of the human pathway archive before and
after launching WikiPathways. The numbers for years 2005-2008 are
derived from the size of the GenMAPP archive, which served as the
initial seed content for WikiPathways.

If a relation exists between usage and contribution
in WikiPathways, the number of edits to a pathway is 
expected to increase with the number of views. Indeed, 
pathways that have many views are also among the ones 
with the most edits and have also been edited by a larger 
group of authors (Figure 3). Following the positive 
feedback mechanism, usage would also result in a 
growth of content. Since the number of site visits has 
increased from over 1100 per month over the 3 months 
prior to publication to almost 5400 per month over the 
last 3 months of 2010, a growth and improvement of 
content would also be expected. Indeed, compared to 
the initial content of WikiPathways, the number of 
human pathways has grown by 128% and the number 
of annotated human genes in these pathways has increased 
by 30% (Figure 2). To put this number in perspective,
the pathway collection for GenMAPP, on which the initial
WikiPathways content was based, grew by only 1% in
number of pathways and 5% in number of genes in the 
3 years preceding WikiPathways. 

In addition to the usage and content growth of 
WikiPathways, the size and activity of its community 
have also grown. Since January 2008, WikiPathways 
went from 100 to over 1800 registered users with an 
increasing percentage of members creating and editing 
pathways. In 2008, on average 10 users per month made 
one or more edits to a pathway and 87 edits were made per 
month. These numbers have grown to on average 
16 editing users and 261 edits per month over 2010, a 
growth of 56 and 200%, respectively. The barrier to 
contribute still seems fairly high, since on average only 
0.36% of the website visitors actually edited a pathway
one or more times. However, when compared to
Wikipedia, a wiki with a very active community, these
numbers seem more reasonable. For the English
Wikipedia, only 0.02-0.03% of the visitors are active contributors
[defined as at least five edits in a given month
(22)] and for WikiPathways this translates to almost
0.19% averaged over 2010. However, in contrast to
Wikipedia, the content at WikiPathways is focused on a
smaller domain and a large part of the target audience is
expert in this domain. Therefore, it is worth trying to
lower this barrier even further.

Figure 3. Relation between views and edits for the pathways at
WikiPathways. Each dot in the graph represents a group of pathways
that each has been edited by the same number of unique authors
(indicated by color). The size of the dot represents the number of
pathways in that group. The x-axis represents the number of times
the pathway has been viewed (averaged over each group) and the
y-axis represents the number of times the pathway has been edited
(averaged over each group).
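
The growth percentages quoted above for the community follow from a simple relative-change calculation, sketched here in Python with the monthly averages given in the text. With these rounded averages the edits per month reproduce the reported 200%, while the editing users come out at 60% rather than the reported 56%; presumably the reported figure was computed from unrounded averages, which is an assumption on our part.

```python
def percent_growth(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


edits = percent_growth(87, 261)    # edits per month, 2008 vs 2010
editors = percent_growth(10, 16)   # editing users per month, 2008 vs 2010
print(f"edits: {edits:.0f}%, editing users: {editors:.0f}%")
```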

Although the active community shows growth, it
remains relatively small, probably too small to effectively 
keep up with the growth of biological knowledge that can 
be captured in pathways. The size and activity of the com- 
munity can be improved in two main ways that reinforce 
each other. First, the portion of users that become con- 
tributors can be improved to increase the activity of the 
community. Second, improving usability of the pathways
will increase the size of the community, since the number 
of users is proportional to the utility of the content. The 
following sections will highlight several use cases that we
are supporting and actively stimulating to grow the active 
community and improve usability of content. 

Pathway publishing 

Pathway diagrams are widely used in scientific publica- 
tions as figures accompanying the text. These figures are 
being translated into annotated pathways in digital
form; however, this is a rather tedious and often
error-prone task. Using WikiPathways, authors can directly
create a fully annotated pathway and either save it
as a figure or refer to the pathway page as
supplementary material. Several authors have already
taken this approach (23,24), which offers additional ad- 
vantages from the author's point of view. First, the 
pathway page at WikiPathways improves the reader experience,
by providing links to additional detailed information
and unambiguously annotated genes, proteins and
metabolites. In contrast, when using a static image,
manual searching is required to find more information
about a protein on the diagram and ambiguous protein
names may lead to incorrect interpretations. Secondly, the
online pathway improves visibility and usability of the
author's results, by making it searchable via internet
search engines such as Google and by redistributing it
with several bioinformatics tools for use in data analyses
of other researchers. When such analyses lead to new publications,
the original source of the pathway can be
tracked and cited, thereby increasing its measurable 
impact. Once at WikiPathways, the pathway can be 
updated by both the original author and the community 
based on new research results that may become available 
over time to provide a better representation of the 
biological mechanism. The new private pathways feature 
has increased the usability of WikiPathways as a publishing
tool and we hope this encourages more authors to publish
their pathway diagram as supplementary material at 
WikiPathways. 

Curated pathway diagrams describing novel findings 
and perspectives can also be published as posters, presen- 
tations and initial research reports at Nature Precedings. 
We are establishing a WikiPathways Collection at Nature
Precedings through which we will encourage, collect and 
promote publications from the community. This is an at- 
tractive and innovative publishing route for pathway 
curation efforts that are not yet associated with traditionally
defined 'publishable' results.



Building communities 

The explosion of wikis and social curation tools in the 
biological sciences is a testament to the demand for such 
involvement across a wide array of subdisciplines (e.g. 
model organisms, genes, SNPs, structures). These 
communities were not created by wiki tools (anyone who 
has tried to start a wiki knows that the statement 'build it 
and they will come' does not apply). Rather, a wiki simply 
enables and gives coherence to a community that already 
existed, but had yet to find each other. Thus, the real in- 
novation of WikiPathways is not necessarily the wiki tech- 
nology, but rather the fact that we revealed this potential 
community of pathway curators. This innovation is 
changing the definition of what a pathway resource is 
and does. A main focus for pathway databases has been 
to collect and curate a set of canonical pathways. But, 
more recently, we are seeing many WikiPathways con- 
tributors focus on content that is tailored to a particular 
research perspective. Thus, we are breaking the 'canonical' 
mold. For example, pathways related to topics such as
pluripotency, heart development, miRNA, addiction,
SIDS, ossification and aflatoxin, which are typically not
considered canonical pathways, were contributed to
WikiPathways. In the next phase of growth,
our innovation will be to enable not merely a curation 
community, but an expanding collection of curation 
communities, each with their own research interests. If 
we properly capitalize on this innovation, our distributed 
model could experience exponential growth, not possible
for a traditional resource, while simultaneously increasing
quality. 

WikiPathways has already been used by specific curation 
communities to build and maintain a set of focused 
pathways supporting ongoing projects or research collab- 
orations. For example, the Micronutrient Genomics 
project (25) aims to provide a resource for knowledge on 
the biological context of micronutrients and uses 
WikiPathways to collaboratively edit a subset of 
pathways related to these topics. A core team of experts 
builds pathways and streamlines contributions from the
community. Along the same lines, the California Institute
for Regenerative Medicine (CIRM) has adopted
WikiPathways to highlight a subset of pathways
contributed by the stem cell research community. For ini- 
tiatives like these, WikiPathways provides the option to 
create a portal page, which provides an access point to 
the subset of pathways and can be customized with the 
project logo and announcements. Such portals make the
content more attractive for a specific group of users
because they provide a more convenient entry point that is
focused on their research subject.

Supporting centrally curated databases 

WikiPathways also provides a framework for collecting 
community contributions for centrally curated databases. 
In this setup, the content of the database is mirrored 
on WikiPathways which provides a medium for its users 
to contribute corrections or additions. This way, 
WikiPathways complements centrally curated databases 
by providing a staging ground for new content that can 
then be reviewed by appointed curators for inclusion in 
the database. The Reactome database is currently in the 
process of setting up this workflow using WikiPathways to 
improve their ability to collect community contributions 
(26). A similar approach at smaller scale is taken by the 
maintainers of the PluriNetwork (27), an electronic 
resource for curated protein interactions relevant to 
pluripotency, for which a version is maintained at 
WikiPathways that allows users to contribute new interactions.
As another example, the NetPath cancer and
immune signaling pathways are also maintained at 
WikiPathways to complement their focused, more 
stringent, system of curation (28). Partnering with other 
pathway or interaction resources directly increases the 
community and contribution rate of WikiPathways. 
In addition, it increases database inter-compatibility and 
makes each contribution from the community more 
valuable, because it will eventually be distributed over different
pathway resources.

Future plans

Improving the curation experience. We aim to improve the 
curator tools by bringing relevant data to the curator. For 
example, if the pathway already contains a given gene, we 
could provide the curator with known interactions n-degrees 
from that gene, or other genes related by functional annota- 
tion. The benefits to the curator are 2-fold. First, they have 
access to relevant snippets of data that are otherwise buried in 
various databases. And, second, they can copy the snippet 
into the pathway they are editing and maintain the relation- 
ships and annotations from the source database, including 
evidence codes and literature references. This way both ease 
and quality of user contributions can be improved.
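
Finding known interactions within n degrees of a gene amounts to a bounded breadth-first search over an interaction network. The Python sketch below shows one way to do this; the `INTERACTIONS` table is made-up example data and `within_degrees` is a hypothetical helper, not part of any planned curator tool.

```python
from collections import deque

# Made-up interaction data: gene -> directly interacting genes.
INTERACTIONS = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A"},
    "D": {"B", "E"},
    "E": {"D"},
}


def within_degrees(graph, start, n):
    """Return {gene: distance} for genes reachable from `start` in at most n steps."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        gene = queue.popleft()
        if distance[gene] == n:
            continue  # do not expand beyond n degrees
        for neighbour in graph.get(gene, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[gene] + 1
                queue.append(neighbour)
    del distance[start]  # the starting gene itself is not a suggestion
    return distance


print(sorted(within_degrees(INTERACTIONS, "A", 2).items()))
```

Returning the distance alongside each gene would let a curator tool show direct interaction partners before more remote ones.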

Structured pathway content. The anticipated growth of 
WikiPathways necessitates a forward-looking approach to 
data storage and representation. There are three major con- 
siderations: scalability, accommodating constant and 
dynamic data changes, and the ability to support complex 
queries. Complex queries are an essential consideration 
because WikiPathways users will need more than basic 
'select' and 'join' query access. They will need to be able to 
query across multiple resources and levels, essentially access- 
ing various, dynamically generated 'super pathways'. This 
goes well beyond the capabilities of our current web service 
API and MySQL database solutions. Therefore, we aim to 
make WikiPathways content more accessible and connected 
through semantic technologies. We will extend 
WikiPathways with customized semantic components and 
derive triples from our structured GPML content, inferred 
pathway information, pathway metadata and selected 
external content. By periodically synchronizing our 
semantic data with major biological data repositories, our 
content can be effectively connected with these massive and 
growing collections. This way, our data will be accessible to 
growing numbers of semantic tools for advanced search, data 
integration and bioinformatics analysis. 
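
As a rough sketch of what deriving triples from structured pathway content could look like, the Python example below flattens a small pathway record into (subject, predicate, object) tuples and queries them by pattern. The record layout, the predicate names (`hasOrganism`, `hasParticipant`, `hasLabel`) and the helper functions are hypothetical, and plain tuples stand in for a real triple store.

```python
# Hypothetical pathway record, standing in for parsed GPML content.
PATHWAY = {
    "id": "WP1234",
    "organism": "Homo sapiens",
    "genes": [("TP53", "Entrez Gene", "7157")],
}


def to_triples(pathway):
    """Flatten a pathway record into (subject, predicate, object) triples."""
    subject = "pathway:" + pathway["id"]
    triples = [(subject, "hasOrganism", pathway["organism"])]
    for label, database, identifier in pathway["genes"]:
        node = f"{database}:{identifier}"
        triples.append((subject, "hasParticipant", node))
        triples.append((node, "hasLabel", label))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


triples = to_triples(PATHWAY)
print(match(triples, p="hasParticipant"))
```

Because every statement has the same three-part shape, triples from several pathways or external resources can be pooled and queried together, which is the kind of cross-resource query described above.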

Public data integration. To enhance the usability of the 
pathway information on the WikiPathways website, we 
aim to support direct integration of publicly available data 
and allow the user to customize the information content dis- 
played on a given pathway page. For example, we could 
directly map reference gene expression data to
visualize the expression level of the genes in a pathway in a
given tissue. Or by querying gene-disease associations, we
could display a ranked list of potentially relevant disease
terms per pathway. This way, WikiPathways may actually
be serving as a high-level knowledge management tool,
providing researchers and domain experts access to related
snippets of data from disparate resources, enabling them to
annotate and qualify new connections in the context of 
biological pathways, and ultimately producing novel 
snippets of data for future reuse. 



CONCLUSION 

We presented new features of WikiPathways that improve 
both usability and curation of pathway content. The
growth in content and active community indicates that
WikiPathways is being adopted by the research community
as a pathway resource as well as a framework for publishing
and curating biological knowledge. Eventually,
both usage and contribution need to reach a critical 
mass to establish a stable active community that can 
keep up with the continuously growing amount of know- 
ledge and publications. The open nature of the 
WikiPathways project (in content, code and collaborations)
allows different user groups to adapt and implement
features to support a specific use case, and stimulates
new communities to adapt WikiPathways for their 
research. In the end, we hope this will contribute to a 
better representation of our knowledge as biological 
pathways, and contribute to improving exploratory 
pathway analysis.